1 Overview

The trend toward Unicode [1] as the universal character encoding for information
interchange has been growing stronger, and most software producers on the market have
announced plans to adopt it. I therefore propose here an appropriate way to encode
text in the Latinized Taiwanese languages with Unicode.

The Latinized Taiwanese languages are 'Amis, Bunun, Hak-ka-fa, Hō-ló-oē, Mandarin,
Paiwan, Puyuma, Rukai, Saisiat, Tao, Tayal, Thao, Truku, and Tsou. Except for
Mandarin, where Hanyu Pinyin is considered, the Latinized orthographies discussed
here are those found in the published Bible translations. Additional information
found in other published materials using the same or only slightly different
Latinized orthographies is also considered.

The most important parts of this article are sections 3, 4, 6 and 7, where characters
not encoded in ISO 646 (ASCII) are used. Only such characters will be discussed in
these sections.



2 'Amis, Tao, Tayal and Truku 

The Latinized forms of the languages 'Amis, Tao, Tayal and Truku are representable 
by the characters encoded by ISO 646 (ASCII). For the record, brief descriptions of the 
references are provided. 

2.1 'Amis 

The new 'Amis translation of the Bible is still being typeset (in TeX), but I have a
descriptive introduction to the Latinization [2] that most likely will be used by this new
translation. According to this introduction, all characters used are in ISO 646.

2.2 Tao

All the characters in the Tao Bible [3] are ISO 646 characters. 

2.3 Tayal 

The Bible Society has told me that the Tayal Bible will soon be published. From 
a different source I have obtained an introductory description of the Latinization [4] 
published by the Presbyterian Church in Taiwan (Tâi-oân Ki-tok Tiúⁿ-ló Kàu-hōe).
This Latinization is most likely that used in the Bible. All characters used are ISO 646 
characters. 

2.4 Truku 

All characters used in the Truku Bible [5] are ISO 646 characters. 
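
A claim of the form "all characters are ISO 646 characters" can be checked
mechanically for any text at hand. The following is a minimal sketch in Python; the
function name and the sample strings are assumptions made for illustration and are
not taken from the cited Bibles.

```python
import unicodedata


def non_ascii_characters(text):
    """Return sorted (code point label, character name) pairs for every
    character in text that lies outside ISO 646 (ASCII)."""
    found = {ch for ch in text if ord(ch) > 0x7F}
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in sorted(found)
    ]


# Hypothetical samples: plain ASCII text, then a letter discussed in section 3.
print(non_ascii_characters("'Amis, Tao, Tayal, Truku"))
print(non_ascii_characters("\u0111"))
```

An empty list means the text is representable in ISO 646 alone.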

3 Bunun 

The introduction to the Bunun Latinization [6] published by the Presbyterian Church 
and the Bible Society does not contain any non-ISO 646 characters, but the Hymnal [7] 
published by the Presbyterian Church contains two non-ISO 646 characters. 

Đ U+0110 LATIN CAPITAL LETTER D WITH STROKE
đ U+0111 LATIN SMALL LETTER D WITH STROKE

The book is poorly typeset, with all the strokes drawn by hand. The stroke on the
small letter goes through the bowl rather than the ascender, unlike the one shown
here and in [1].

4 Hak-ka-fa and Hō-ló-oē

Hak-ka-fa and Hō-ló-oē share the characters composed of one of the Latin letters "a",
"e", "i", "o", "u", "m", and "n" with one of the combining diacritics acute, grave,
circumflex, macron (Hō-ló-oē only) and vertical line above. The relevant Unicode
characters are listed below.



A U+0041 LATIN CAPITAL LETTER A
E U+0045 LATIN CAPITAL LETTER E
I U+0049 LATIN CAPITAL LETTER I
M U+004D LATIN CAPITAL LETTER M
N U+004E LATIN CAPITAL LETTER N
O U+004F LATIN CAPITAL LETTER O
U U+0055 LATIN CAPITAL LETTER U
a U+0061 LATIN SMALL LETTER A
e U+0065 LATIN SMALL LETTER E
i U+0069 LATIN SMALL LETTER I
m U+006D LATIN SMALL LETTER M
n U+006E LATIN SMALL LETTER N
o U+006F LATIN SMALL LETTER O
u U+0075 LATIN SMALL LETTER U
ò U+0300 COMBINING GRAVE ACCENT
ó U+0301 COMBINING ACUTE ACCENT
ô U+0302 COMBINING CIRCUMFLEX ACCENT
ō U+0304 COMBINING MACRON
o̍ U+030D COMBINING VERTICAL LINE ABOVE

Unicode also has some precomposed characters used in both Hak-ka-fa and Hō-ló-oē.
To save space, they are listed below not individually but in general categories
with code ranges. For an exhaustive, detailed list, see [8].

A, E, I, O, U, a, e, i, o or u + grave, acute or circumflex: in the range U+00C0 to U+00FB

A, E, I, O, U, a, e, i, o or u + macron: in the range U+0100 to U+016B

Ḿ, ḿ U+1E3E, U+1E3F

Ń, ń U+0143, U+0144
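
Which letter and diacritic pairs have a precomposed character can be found by
normalizing each combining sequence to NFC and checking whether a single code point
results. The following Python sketch does this for the base letters and diacritics
of this section. The names `BASES`, `MARKS` and `precomposed_form` are assumptions
made for illustration. The answers reflect the Unicode database bundled with the
Python interpreter, which may be newer than the version described in [1], so a few
pairs may be reported as precomposed that are not listed above.

```python
import unicodedata

BASES = "AEIMNOUaeimnou"
MARKS = {
    "\u0300": "grave",
    "\u0301": "acute",
    "\u0302": "circumflex",
    "\u0304": "macron",
    "\u030D": "vertical line above",
}


def precomposed_form(base, mark):
    """Return the single precomposed character for base + mark,
    or None if the pair only exists as a combining sequence."""
    composed = unicodedata.normalize("NFC", base + mark)
    return composed if len(composed) == 1 else None


for base in BASES:
    for mark, label in MARKS.items():
        single = precomposed_form(base, mark)
        status = f"U+{ord(single):04X}" if single else "combining sequence only"
        print(f"{base} + {label}: {status}")
```

Pairs reported as "combining sequence only", such as any letter with the vertical
line above, must be stored as the base letter followed by the combining diacritic.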

4.1 Hak-ka-fa 

In addition to the characters listed above, the Hakka Bible [9] of the Taiwanese Si-yen
dialect uses one other vowel, "ṳ", and its combinations with acute, grave, circumflex,
and vertical line above.

o̤ U+0324 COMBINING DIAERESIS BELOW
Ṳ U+1E72 LATIN CAPITAL LETTER U WITH DIAERESIS BELOW
ṳ U+1E73 LATIN SMALL LETTER U WITH DIAERESIS BELOW
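
When this vowel carries a tone mark, two combining characters follow the base letter,
one below and one above. Unicode normalization puts them into a fixed order, so the
order in which they were typed does not matter. The Python sketch below illustrates
this with an acute accent; the variable names are assumptions made for illustration.

```python
import unicodedata

DIAERESIS_BELOW = "\u0324"
ACUTE = "\u0301"

# Typed with the mark above first and the mark below second.
typed = "u" + ACUTE + DIAERESIS_BELOW

nfc = unicodedata.normalize("NFC", typed)
nfd = unicodedata.normalize("NFD", typed)

# NFC: precomposed U WITH DIAERESIS BELOW followed by the acute accent.
print([f"U+{ord(ch):04X}" for ch in nfc])
# NFD: base letter, then the mark below, then the mark above.
print([f"U+{ord(ch):04X}" for ch in nfd])
```

Comparing texts after normalizing both to the same form avoids false mismatches
between differently ordered but equivalent sequences.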

PHANG Tet-siu tries to present the pronunciations in both major Taiwanese Hak-ka
dialects (Si-yen and Hoi-liuk) with the same Latinization in his introduction [10] and
his dictionary [11]. The most important character is the Hoi-liuk seventh tone, denoted
by a COMBINING RING ABOVE.

o̊ U+030A COMBINING RING ABOVE
Å U+00C5 = U+0041 + U+030A LATIN CAPITAL LETTER A WITH RING ABOVE
Ů U+016E = U+0055 + U+030A LATIN CAPITAL LETTER U WITH RING ABOVE
å U+00E5 = U+0061 + U+030A LATIN SMALL LETTER A WITH RING ABOVE
ů U+016F = U+0075 + U+030A LATIN SMALL LETTER U WITH RING ABOVE

Other special characters used by PHANG are listed below.

o̲ U+0332 COMBINING LOW LINE
ọ U+0323 COMBINING DOT BELOW
o̥ U+0325 COMBINING RING BELOW

4.2 Hō-ló-oē

In addition to the characters shared with Hak-ka-fa, Hō-ló-oē has three more
characters: capital and small "o·", and "ⁿ".

O· U+004F + U+00B7
o· U+006F + U+00B7
ⁿ U+207F SUPERSCRIPT LATIN SMALL LETTER N

Typographically, the U+00B7 MIDDLE DOT should be properly raised, as it is 
here, to give the traditional appearance; some people may also prefer to have it kerned 
slightly. 
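
One practical consequence of this encoding is that compatibility normalization must
be avoided: NFKC folds U+207F into a plain "n", which destroys the distinction the
orthography relies on, whereas NFC leaves both characters untouched. The following
Python sketch illustrates this; the sample string is a hypothetical syllable chosen
only to exercise both characters.

```python
import unicodedata

MIDDLE_DOT = "\u00B7"
SUPERSCRIPT_N = "\u207F"

# Hypothetical sample: small o, raised dot, superscript n.
syllable = "o" + MIDDLE_DOT + SUPERSCRIPT_N

nfc = unicodedata.normalize("NFC", syllable)
nfkc = unicodedata.normalize("NFKC", syllable)

# Canonical normalization keeps the sequence as encoded.
print(nfc == syllable)
# Compatibility normalization replaces the superscript n with an ordinary n.
print([f"U+{ord(ch):04X}" for ch in nfkc])
```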

A superscript capital N, "ᴺ", was used in all-capital text in the dictionary of 1913
[12] and the Bible of 1933 [13]¹, but it is not found in materials published in the
second half of the 20th century, including [14] and [15]².

The Bible of 1933 [13] also contained "wide diacritics" over both characters of the
vowel "ng", for example a circumflex stretched over both letters³. All other
published materials I acquired presented this as "n̂g".

5 Mandarin 

Since Unicode was designed with Hanyu Pinyin of Mandarin in mind, the presentation
thereof can be easily deduced⁴; thus a discussion here would be redundant.

¹ For example, "JI-SIᴺ" [12, p. 93], and "SAT-BO·-JIᴺ" [13, p. 305].
² For example, "KO-IUⁿ" [14, p. 1201], and "KI-THAⁿ" [15, Bok-liok].
³ "mn̂g-chêng", [13, II Lia̍t-ông-kì 23:8, p. 444].
⁴ See, for example, [1, pp. 7-26 to 7-27].

6 Paiwan

The Paiwan Bible [16] and the introduction to Paiwan Latinization [17] contain the
following non-ISO 646 characters, all of which are Unicode characters.

o̱ U+0331 COMBINING MACRON BELOW
Ḏ U+1E0E = U+0044 + U+0331 LATIN CAPITAL LETTER D WITH LINE BELOW
ḏ U+1E0F = U+0064 + U+0331 LATIN SMALL LETTER D WITH LINE BELOW
Ḻ U+1E3A = U+004C + U+0331 LATIN CAPITAL LETTER L WITH LINE BELOW
ḻ U+1E3B = U+006C + U+0331 LATIN SMALL LETTER L WITH LINE BELOW
Ṟ U+1E5E = U+0052 + U+0331 LATIN CAPITAL LETTER R WITH LINE BELOW
ṟ U+1E5F = U+0072 + U+0331 LATIN SMALL LETTER R WITH LINE BELOW
Ṯ U+1E6E = U+0054 + U+0331 LATIN CAPITAL LETTER T WITH LINE BELOW
ṯ U+1E6F = U+0074 + U+0331 LATIN SMALL LETTER T WITH LINE BELOW
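
The equivalence between each precomposed Paiwan letter and its base letter plus
U+0331 can be reproduced from the Unicode character database. The Python sketch
below composes each pair with NFC and prints the resulting code point and name; the
function name `line_below` is an assumption made for illustration.

```python
import unicodedata

MACRON_BELOW = "\u0331"


def line_below(base):
    """Compose a base letter with COMBINING MACRON BELOW using NFC."""
    return unicodedata.normalize("NFC", base + MACRON_BELOW)


for base in "DdLlRrTt":
    composed = line_below(base)
    print(
        f"{composed} U+{ord(composed):04X} = U+{ord(base):04X} + U+0331 "
        f"{unicodedata.name(composed)}"
    )
```

Normalizing in the other direction with NFD recovers the two-character sequence, so
either form can be stored as long as comparison is done on normalized text.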



7 Tsou 

No Bible translation exists for the Tsou language. I have obtained a Mandarin
translation of the authoritative description of the Tsou language [18], and other books
in the Tsou language by two Tsou experts, Pu Zhongyong and Pu Zhongcheng. These
books give a Latinization used by the Christian church, which uses the following
non-ISO 646 characters.

Ï U+00CF LATIN CAPITAL LETTER I WITH DIAERESIS
ï U+00EF LATIN SMALL LETTER I WITH DIAERESIS
U̶ U+0055 + U+0336
ʉ U+0289 LATIN SMALL LETTER U BAR

Although the capital letter u bar "U̶" is not found in the literature cited, it is
conceptually necessary, and is composed using the capital letter "U" and U+0336
COMBINING LONG STROKE OVERLAY.
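
The asymmetry between the two cases shows up directly in code: the small letter is a
single code point with no decomposition, while the capital proposed here always
remains a two-character sequence, even after NFC normalization. The Python sketch
below illustrates this; the variable names are assumptions made for illustration.

```python
import unicodedata

LONG_STROKE_OVERLAY = "\u0336"

capital_u_bar = "U" + LONG_STROKE_OVERLAY  # composed as proposed above
small_u_bar = "\u0289"

nfc = unicodedata.normalize("NFC", capital_u_bar)

# The capital stays a sequence of two code points under NFC.
print([f"U+{ord(ch):04X}" for ch in nfc])
# The small letter is atomic: it has a name but no decomposition mapping.
print(unicodedata.name(small_u_bar))
print(repr(unicodedata.decomposition(small_u_bar)))
```

Software following this proposal would therefore need its own mapping between the
composed capital and the small letter when changing case; such a mapping is an
assumption of this sketch and is not derived from the cited literature.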



8 Puyuma, Rukai, Saisiat and Thao 

I have obtained little or no information on the following four languages: Puyuma,
Rukai, Saisiat, and Thao, mostly due to the lack of published materials and the absence
of Bible translations in these languages, which could in turn be accounted for by their
extremely small populations of native speakers. My contact at the Bible Society told me
that the Bible in Rukai will be published soon, but there is still no word about
the other three languages. Because of the absence of applicable information, I will not
discuss these four languages further in this article.